Abstract

We present a system for recognising human activity given a sym- 
bolic representation of video content. The input of our system is a set 
of time-stamped short-term activities (STA) detected on video frames. 
The output is a set of recognised long-term activities (LTA), which 
are pre-defined temporal combinations of STA. The constraints on the 
STA that, if satisfied, lead to the recognition of a LTA, have been ex- 
pressed using a dialect of the Event Calculus. In order to handle the 
uncertainty that naturally occurs in human activity recognition, we
adapted this dialect to a state-of-the-art probabilistic logic programming
framework. We present a detailed evaluation and comparison of
the crisp and probabilistic approaches through experimentation on a
benchmark dataset of human surveillance videos.

1 Introduction 

A common approach to human activity recognition separates low-level from 
high-level recognition. The output of the former type of recognition is a set 
of activities taking place in a short period of time: 'short-term activities' 
(STA). The output of the latter type of recognition is a set of 'long-term 
activities' (LTA), which are temporal combinations of STA. We focus on 
high-level recognition. 

We define a set of LTA of interest, such as 'fighting' and 'meeting', as 
temporal combinations of STA such as 'walking', 'running', and 'inactive' 
(standing still) using a logic programming (Prolog) implementation of the 
Event Calculus (EC) [21]. We employ EC to express the temporal constraints
on a set of STA that, if satisfied, lead to the recognition of a LTA. 

In earlier work [3] we identified various types of uncertainty that exist in
activity recognition, such as erroneous STA detection. To address this issue, 
we extend our work by presenting Prob-EC, an EC dialect suitable for
probabilistic activity recognition. Prob-EC operates on the state-of-the-art
probabilistic logic programming framework ProbLog [TO]. Prob-EC, therefore, may
operate in settings where STA occurrences are assigned a confidence value
by the underlying low-level tracking system (such as, for example, a
probabilistic classifier). We present an extensive experimental evaluation of
Prob-EC on a benchmark activity recognition dataset. The evaluation demonstrates
the conditions in which Prob-EC outperforms our previous EC dialect,
Crisp-EC. Prob-EC is the first EC dialect able to deal with uncertainty in
the input STA. Moreover, this is the first approach that thoroughly evaluates 
EC in a probabilistic framework. The full code of Prob-EC, along with the 
dataset on which experimentation is performed, is directly available from 
the authors. 

The remainder of the paper is organised as follows. In the following section
we set our work in context. In Section 3 we present Crisp-EC while in
Sections 4 and 5 we describe, respectively, the dataset on which we perform
activity recognition and the corresponding knowledge base of LTA definitions.
Section 6 briefly introduces ProbLog while Section 7 describes Prob-EC.
Our experimental results, including the comparison between Prob-EC
and Crisp-EC, are presented in Section 8. Finally, in Section 9 we summarise
our observations and outline directions for further research.

2 Related Work 

Numerous recognition systems have been proposed in the literature. In this 
section we focus on long-term activity (LTA) recognition systems that, sim- 
ilar to our approach, exhibit a formal, declarative semantics. 

A fair number of recognition systems are logic-based. Notable approaches
include the Chronicle Recognition System [11], the Event Calculus dialect
of Paschke et al. [28, 27] and the hierarchical event representation of Hakeem
and Shah [15]. Three recent reviews of logic-based recognition systems may
be found in [10, 29, 2]. These systems have in common that they employ
logic-based methods for representation and inference, but are unable to handle
noise.

Shet et al. [3H [35] have presented a logic programming approach to ac- 
tivity recognition which touches upon the issue of data coming from noisy 
sensors. In that work, LTA concerning theft, entry violation, unattended 
packages, and so on, have been defined. Within their activity recognition 
system, Shet and colleagues have incorporated mechanisms for reasoning 
over rules and facts that have an uncertainty value attached. Uncertainty 
in rules corresponds to a measure of rule reliability. On the other hand, 
uncertainty in facts represents the detection probabilities of the short-term 
activities (STA). In the VidMAP system [35], a mid-level module which
generates Prolog facts automatically 'filters out' data that a low-level image 
processing module has misclassified (such as a tree mistaken for a human). 
Shet and colleagues have noted of the 'filtering' carried out by this module
that '...it does so by observing whether or not the object has been persis- 
tently tracked' [Ml P- 2]. In [35], an algebraic data structure known as a 
bilattice [13] is used to detect human entities based on uncertain output
of part-based detectors, such as head or leg detectors. The bilattice struc- 
ture associates every STA or LTA with two uncertainty values, one encoding 
available information and the other encoding confidence. The more confident 
information is provided, the more probable the respective LTA becomes. 

Probabilistic graphical models have been applied to a variety of activity
recognition applications where uncertainty exists. Activity recognition
requires processing streams of timestamped STA and, therefore, numerous 
activity recognition methods are based on sequential variants of probabilistic 
graphical models, such as Hidden Markov Models [30], Dynamic Bayesian
Networks [UJ and Conditional Random Fields [22]. Compared to logic-based
methods, graphical models can naturally handle uncertainty but their propo- 
sitional structure provides limited representation capabilities. To model LTA 
that involve a large number of relations among STA, such as interactions 
between multiple persons and/or objects, the structure of the model may 
become prohibitively large and complex. To overcome such limitations, these 
models have been extended in order to support more complex relations.
Examples of such extensions include representing interactions involving
multiple domain objects O HH H2J [39], capturing long-term dependencies
between states [T7], as well as describing a hierarchical composition of
activities [26, 23]. However, the lack of a formal representation language makes the
definition of complex LTA complicated and the integration of domain back- 
ground knowledge very hard. 

Markov Logic Networks (MLN) [31] have also been used for representing 
uncertainty in activity recognition. MLN employ first-order logic represen- 
tation, where each formula may be associated with a weight, indicating the 
confidence we have in the formula. The knowledge base of weighted formulas
is translated into a Markov network where probabilistic inference is
performed. The approach of Biswas et al. [5], for example, uses MLN to
recognise LTA given the STA that have been observed by low-level classifiers. A
more expressive approach that can represent persistent and concurrent LTA,
as well as their starting and ending time-points, is proposed by Helaoui et
al. [16]. The method of Sadilek and Kautz [32] employs hybrid-MLN [JT]
in order to recognise successful and failed interactions between multiple hu- 
mans using noisy location data. Similar to pure MLN-based methods, the 
knowledge base is composed of LTA definitions. However, hybrid formulas
aiming to remove the noise from the location data are also included. Hybrid 
formulas are defined as normal formulas, but their weights are also associ- 
ated with a real-valued function, such as the distance of two persons. As a 
result, the confidence of the formula is defined by both its weight and func- 
tion. Although these methods incorporate first-order logic representation, 
the presented LTA definitions have a limited temporal representation; for
instance, temporal constraints are defined over successive instants of time.

A method that uses interval-based temporal relations is proposed in [21].
The aim of the method is to determine the most consistent sequence of LTA
based on the observations of low-level classifiers. Similar to [38, 18], the
method uses MLN to express LTA using common sense rules. In contrast to
[38, 18], it employs temporal relations based on Allen's Interval Algebra (IA)
p] . In order to avoid the combinatorial explosion of possible intervals that IA 
may produce, a bottom-up process eliminates the unlikely LTA hypotheses. 
In [2 [33] a probabilistic extension of Event Logic [36] is proposed in order 
to perform interval-based activity recognition. Similar to MLN, the method 
defines a probabilistic model from a set of weighted LTA. However, the Event 
Logic representation avoids the enumeration of all possible interval relations. 

The main difference of our approach with respect to the aforementioned 
lines of work concerns the fact that we use the Event Calculus (EC) for 
temporal representation and reasoning. EC has built-in rules for complex 
temporal representation, including the formalisation of inertia, which help 
considerably the system designer in developing activity definitions. With the 
use of EC one may develop intuitive, succinct activity definitions, facilitating 
the interaction between activity definition developer and domain expert, 
and allowing for code maintenance. Furthermore, being logic programming- 
based, its translation to a probabilistic logic programming framework such 
as ProbLog is a straightforward process. To the best of our knowledge, the 
probabilistic EC dialect presented in this paper, Prob-EC, is the first EC 
dialect able to deal with uncertainty in STA detection. 

A MLN-based approach that is complementary to our work is that of 
Skarlatidis et al. [37], which introduces a probabilistic EC dialect based on 
MLN. This dialect and Prob-EC tackle the problem of probabilistic infer- 
ence from different viewpoints. Prob-EC handles noise in the input stream, 
represented as detection probabilities of the STA. The MLN-based EC di- 
alect, on the other hand, emphasises uncertainty in activity definitions in 
the form of rule weights. 

ProbLog and MLN are closely related. A notable difference between them 
is that MLN, as an extension of first-order logic, are not bound by the 
closed-world assumption. There exists a body of work that investigates the 
connection between the two frameworks. Bruynooghe et al. [5], for example, 
developed an extension of ProbLog which is able to handle first-order
formulas with weighted constraints. Fierens et al. [12] converted probabilistic
logic programs to ground MLN and then used state-of-the-art MLN inference 
algorithms to perform inference on the transformed programs. 
Predicate                  Meaning
-------------------------  ----------------------------------------
happensAt(E, T)            Event E is occurring at time T
initially(F = V)           The value of fluent F is V at time 0
holdsAt(F = V, T)          The value of fluent F is V at time T
initiatedAt(F = V, T)      At time T a period of time
                           for which F = V is initiated
terminatedAt(F = V, T)     At time T a period of time
                           for which F = V is terminated

Table 1: Main Predicates of Crisp-EC.



3 The Event Calculus 



Our LTA recognition system is based on a logic programming (Prolog) im- 
plementation of an EC dialect. EC, introduced in [21], is a many-sorted, 
first-order predicate calculus for representing and reasoning about events 
and their effects. For the dialect presented here — Crisp-EC — the time 
model is linear and may include real numbers or integers. Where F is a flu- 
ent — a property that is allowed to have different values at different points 
in time — the term F = V denotes that fluent F has value V. Boolean fluents 
are a special case in which the possible values are true and false. Informally, 
F = V holds at a particular time-point if F = V has been initiated by an 
event at some earlier time-point, and not terminated by another event in 
the meantime. 

We represent STA as events and LTA as fluents. In this way, we can ex- 
press the conditions in which the occurrence of a STA initiates or terminates 
a LTA. 

An event description in Crisp-EC includes rules that define, among other 
things, the event occurrences (with the use of the happensAt predicate), the 
effects of events (with the use of the initiatedAt and terminatedAt predicates), 
and the values of the fluents (with the use of the initially and holdsAt
predicates). Table 1 summarises the main predicates of Crisp-EC. Variables,
starting with an upper-case letter, are assumed to be universally quantified 
unless otherwise indicated. Predicates, function symbols and constants start 
with a lower-case letter. 

The domain-independent rules for holdsAt can be written in the following
form:

    holdsAt(F = V, T) ←
        initiatedAt(F = V, Ts),
        Ts < T,                                     (1)
        not broken(F = V, Ts, T)

    broken(F = V, Ts, T) ←
        terminatedAt(F = V, Tf),                    (2)
        Ts < Tf < T

    broken(F = V1, Ts, T) ←
        initiatedAt(F = V2, Tf),
        V1 ≠ V2,                                    (3)
        Ts < Tf < T

not in rule (1) represents 'negation by failure', which provides a form of
default persistence, 'inertia', of fluents. According to rule (1), F = V
holds at time-point T if the fluent F has been initiated to value V at an
earlier time Ts, and has not been broken since. According to rule (2), a period
of time for which F = V holds is broken at Tf if F = V is terminated at Tf.
Rule (3) dictates that if F = V2 is initiated at Tf then effectively F = V1 is
terminated at Tf, for all other possible values V1 of F. Rule (3) therefore
ensures that a fluent cannot have more than one value at any time. We do
not insist that a fluent must have a value at every time-point. In Crisp-EC
there is a difference between initiating a Boolean fluent F = false and
terminating F = true: the first implies, but is not implied by, the second.

According to rules (1)-(3), F = V does not hold at the time it was initiated,
while it holds at the time it was terminated.

Besides the domain-independent rule initiatedAt(F = V, 0) ← initially(F = V),
the definitions of initiatedAt and terminatedAt are domain-specific. One common
form of rule for initiatedAt is the following:

    initiatedAt(F = V, T) ←
        happensAt(E, T),                            (4)
        Conditions[T]

where Conditions[T] is some set of further conditions referring to time-point
T.

terminatedAt rules are handled similarly. In Section 5 we illustrate the use
of initiatedAt and terminatedAt rules for expressing LTA definitions. 
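
The inertia semantics of the domain-independent holdsAt and broken rules can be illustrated with a small Python sketch. This is not the Prolog implementation of Crisp-EC; the function name `holds_at` and the representation of initiation and termination points as sets of (fluent, value, time) triples are assumptions made purely for illustration.

```python
def holds_at(fluent, value, t, initiated, terminated):
    """Return True if fluent = value holds at time t.

    initiated / terminated are sets of (fluent, value, time) triples
    (a hypothetical encoding of initiatedAt / terminatedAt facts).
    """
    for (f, v, ts) in initiated:
        # The fluent must have been initiated to this value strictly before t.
        if f != fluent or v != value or not ts < t:
            continue
        # Broken if the same value is terminated strictly between ts and t.
        broken = any(
            f2 == fluent and v2 == value and ts < tf < t
            for (f2, v2, tf) in terminated
        )
        # Broken also if a different value is initiated strictly between ts and t.
        broken = broken or any(
            f2 == fluent and v2 != value and ts < tf < t
            for (f2, v2, tf) in initiated
        )
        if not broken:
            return True
    return False


initiated = {("meeting", True, 10)}
terminated = {("meeting", True, 20)}
print([t for t in range(8, 24) if holds_at("meeting", True, t, initiated, terminated)])
```

With these inputs the fluent holds from time-point 11 up to and including time-point 20, which matches the remark that a fluent does not hold at its initiation time but does hold at its termination time.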

4 Short-Term Activities 

In this paper we use the first dataset of the CAVIAR project to perform
LTA recognition. This dataset includes 28 surveillance videos of a public
space. The videos are staged: actors walk around, sit down, meet one
another, leave objects behind, fight, and so on. Each video has been manually 
annotated by the CAVIAR team in order to provide the ground truth for 
both STA and LTA. For this set of experiments, the input to our recognition 
system is: 

(i) The STA walking, running, active (non-abrupt body movement in the 
same position) and inactive (standing still), together with their time- 
stamps, that is, the video frame in which that STA took place. All of 
these activities are mutually exclusive and are represented by means 
of the happensAt predicate. For example, happensAt(active(id6), 15560)
expresses that id6 displayed 'active' bodily movement at time-point
15560. STA are represented as instantaneous events in EC in order to 
use the initiatedAt and terminatedAt predicates to express the conditions 
in which these activities initiate and terminate a LTA. 

(ii) The coordinates of the tracked people and objects as pixel positions at 
each time-point, as well as their orientation. The coordinates are
represented with the use of the holdsAt predicate. holdsAt(coord(id2) = (14,
55), 10600), for example, expresses that the coordinates of id2 are
(14, 55) at time-point 10600. Orientation is also encoded using the
holdsAt predicate. For instance, holdsAt(orientation(id2) = 120, 10600)
expresses that, in the two-dimensional projection of the video, the 
same person was forming a 120° angle with the x-axis at the same 
time-point. This type of information is necessary for computing the 
distance between two entities as well as the direction to which a per- 
son might be headed. 

(iii) The first and the last time a person or object is tracked ('appears' and 
'disappears'). This type of input is represented using the happensAt 
predicate. For example, happensAt(appear(id10), 300) expresses that
id10 is first tracked at time-point 300.

Given such input, Crisp-EC recognises the following LTA: a person leaving
an object, people meeting, moving together, or fighting. Long-term activities
are represented as EC fluents. For instance, holdsAt(moving(id1, id5) =
true, 140) states that id1 was moving together with id5 at time-point 140.

LTA recognition is based on a knowledge base of LTA definitions ex- 
pressed in terms of initiatedAt and terminatedAt. In the next section, we present 
example definition fragments of the LTA knowledge base. 
5 Long-Term Activity Definitions



The 'leaving an object' activity is defined as follows:

    initiatedAt(leaving_object(P, Obj) = true, T) ←
        happensAt(appear(Obj), T),
        happensAt(inactive(Obj), T),                (5)
        holdsAt(close(P, Obj, 30) = true, T),
        holdsAt(person(P) = true, T)

    terminatedAt(leaving_object(P, Obj) = true, T) ←
        happensAt(disappear(Obj), T)                (6)


In the CAVIAR videos an object carried by a person is not tracked; only
the person that carries it is tracked. The object will be tracked ('appear') if
and only if the person leaves it somewhere. Moreover, objects (as opposed to
persons) can only exhibit the inactive STA. Accordingly, rule (5) expresses
the conditions in which 'leaving an object' is recognised. The fluent recording
this activity, leaving_object(P, Obj), becomes true at time T if Obj 'appears'
at T, its STA at T is 'inactive', and there is a person P 'close' to Obj at T.
The close(ID1, ID2, Threshold) fluent expresses that the distance between
ID1 and ID2 is at most Threshold pixels. This fluent is defined as follows:

    holdsAt(close(ID1, ID2, Threshold) = true, T) ←
        holdsAt(distance(ID1, ID2) = Dist, T),      (7)
        Dist ≤ Threshold

The distance between two tracked objects/people is computed as the
Euclidean distance between their coordinates in the two-dimensional projection
of the video; recall that the coordinates of each tracked entity are given
as input to our system.

The 30 pixel distance threshold in rule (5) was determined from an
empirical analysis of the CAVIAR dataset.

An object that is picked up by someone is no longer tracked (it
'disappears'), which in turn terminates leaving_object; see rule (6).

In CAVIAR there is no explicit information that a tracked entity is a
person or an inanimate object. Therefore, in our activity definitions we try to
deduce whether a tracked entity is a person or an object given the detected
STA. We defined the fluent person(P) to have value true if P has exhibited
a walking, active or running STA at some time-point since P 'appeared'.

    initiatedAt(person(P) = true, T) ←
        happensAt(walking(P), T)
    initiatedAt(person(P) = true, T) ←
        happensAt(active(P), T)
    initiatedAt(person(P) = true, T) ←              (8)
        happensAt(running(P), T)
    terminatedAt(person(P) = true, T) ←
        happensAt(disappear(P), T)

The value of person(P) is time-dependent because in CAVIAR the identifier
P of a tracked entity that 'disappears' (is no longer tracked) at some point
may be used later to refer to another entity that 'appears' (becomes tracked),
and that other entity may not necessarily be a person. Note, finally, that
rule (5) incorporates a (reasonable) simplifying assumption, that a person
entity will never exhibit 'inactive' activity at the moment it first 'appears'
(is tracked). If an entity is 'inactive' at the moment it 'appears' it can be
assumed to be an object, as in the first two conditions of rule (5).
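
The close fluent amounts to a thresholded Euclidean distance over the pixel coordinates supplied as input. The following Python sketch shows the computation; the function names and the dictionary-based encoding of coordinates are hypothetical and serve only as an illustration of the definition, not as the system's code.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) pixel positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def close(coords, id1, id2, threshold):
    """True if the two tracked entities are at most `threshold` pixels apart.

    coords maps an entity identifier to its (x, y) position at a fixed
    time-point (an assumed encoding of the holdsAt(coord(...)) input).
    """
    return distance(coords[id1], coords[id2]) <= threshold


coords_at_t = {"id1": (14, 55), "id2": (17, 59)}
print(close(coords_at_t, "id1", "id2", 30))
```

Here the two entities are 5 pixels apart, so the check succeeds for the 30 pixel threshold used for 'leaving an object'.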

In a similar way, we may express the definitions of other LTA. The use 
of EC, in combination with the full power of logic programming, allows 
us to express LTA definitions including complex temporal, spatial or other 
constraints. Below we present fragments of the remaining LTA definitions. 

meeting (of two persons P1 and P2) is recognised when two people
'interact': at least one of them is active or inactive, the other is not running,
and the distance between them is at most 25 pixels (all numeric constraints
were determined from an empirical analysis of the dataset). In CAVIAR, this
interaction phase can be seen as some form of greeting, such as a handshake.
Rules (9) and (10) show the conditions in which meeting is initiated:

    initiatedAt(meeting(P1, P2) = true, T) ←
        happensAt(inactive(P1), T),
        holdsAt(close(P1, P2, 25) = true, T),
        holdsAt(person(P1) = true, T),              (9)
        holdsAt(person(P2) = true, T),
        not happensAt(running(P2), T),
        not happensAt(active(P2), T)

    initiatedAt(meeting(P1, P2) = true, T) ←
        happensAt(active(P1), T),
        holdsAt(close(P1, P2, 25) = true, T),       (10)
        holdsAt(person(P2) = true, T),
        not happensAt(running(P2), T)

meeting is terminated by a plethora of conditions, such as when one of 
the two people involved in the LTA starts running or 'disappears'. 
The activity moving was defined in order to recognise whether two people
are walking along together:

    initiatedAt(moving(P1, P2) = true, T) ←
        happensAt(walking(P1), T),
        happensAt(walking(P2), T),
        holdsAt(close(P1, P2, 34) = true, T),       (11)
        holdsAt(orientation(P1) = Or1, T),
        holdsAt(orientation(P2) = Or2, T),
        |Or1 - Or2| < 45



In order to recognise moving, both people involved have to be walking while 
being close to each other. In addition, they have to be facing towards, more 
or less, the same direction (people walking in opposite directions are not 
assumed to be walking along together) . This is accomplished by constraining 
their orientations so that they are, roughly, headed towards the same area 
while they are walking. 
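
The initiation condition for moving can be read as a simple Boolean check over the STA, the distance and the two orientations at a given time-point. The Python sketch below mirrors that reading; the function and parameter names are hypothetical, and the default values of 34 pixels and 45 degrees are taken from the rule above.

```python
def initiates_moving(sta1, sta2, dist, or1, or2, close_px=34, max_angle=45):
    """True if the moving initiation condition is satisfied at one time-point.

    sta1, sta2: STA labels of the two people (assumed to be plain strings).
    dist:       pixel distance between them.
    or1, or2:   their orientations in degrees.
    """
    both_walking = sta1 == "walking" and sta2 == "walking"
    near = dist <= close_px
    same_direction = abs(or1 - or2) < max_angle
    return both_walking and near and same_direction


print(initiates_moving("walking", "walking", 20, 100, 120))
print(initiates_moving("walking", "walking", 20, 100, 200))
```

The sketch follows the literal constraint on the orientation difference and makes no attempt to handle angles that wrap around 360 degrees, since the text does not state how orientations are normalised.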

LTA — in contrast to STA — are not mutually exclusive. For exam- 
ple, meeting may overlap with moving: two people interact and then start 
moving, that is, they walk while being close to each other. In general, how- 
ever, there is no fixed relationship between LTA. 

moving is terminated when either person walks away from the other with
respect to the predefined threshold of 34 pixels:

    terminatedAt(moving(P1, P2) = true, T) ←
        happensAt(walking(P1), T),                  (12)
        holdsAt(close(P1, P2, 34) = false, T)

Other termination conditions for moving include either person running
away from the other, as well as either person 'disappearing' from the scene.

The last definition concerns the fighting activity. In earlier experiments
[3], we formalised fighting in terms of the running and active STA. This
formalisation resulted in very poor recognition accuracy. This was mainly
due to the limited CAVIAR STA dictionary. To overcome this problem, we
manually edited the CAVIAR annotation by introducing the STA abrupt,
which describes a person swaying his arms or legs violently. This is a form of
STA that can be tracked by several state-of-the-art STA detection systems,
such as the one presented in [20]. In this way, we may formalise fighting as
follows:

    initiatedAt(fighting(P1, P2) = true, T) ←
        happensAt(abrupt(P1), T),
        holdsAt(close(P1, P2, 44) = true, T),       (13)
        not happensAt(inactive(P2), T)

To recognise fighting, we require that both people are sufficiently close and
at least one of them moves abruptly, while the other one is not inactive,
indicating that he ought to be participating in the fight somehow. fighting
ceases to be recognised when either person involved in the activity starts
walking or running away, or exits the scene.



6 A Probabilistic Logic Programming Framework 

In this section we briefly present ProbLog [19], a probabilistic extension of
the logic programming language Prolog. ProbLog differs from Prolog in that
it allows for probabilistic facts, which are facts of the form $p_i :: f_i$. In the
expression $p_i :: f_i$, $p_i$ is a real number in the range [0, 1] and $f_i$ is a Prolog
fact. If $f_i$ is not ground, then the probability $p_i$ is applied to all possible
groundings of $f_i$. Classic Prolog facts are silently given probability 1.

Probabilistic facts in a ProbLog program represent random variables. 
Furthermore, ProbLog makes an independence assumption on these vari- 
ables. This means that a rule which is defined as a conjunction of n of these 
probabilistic facts has a probability equal to the product of the probabilities 
of these facts. When a predicate appears in the head of more than one rule
then its probability is computed by calculating the probability of the
implicit disjunction created by the multiple rules. For example, for a predicate
$p$ with two rules $p \leftarrow l_1$ and $p \leftarrow l_2, l_3$, the probability $P(p)$ is computed as
follows:

\begin{align*}
P(p) &= P\big((p \leftarrow l_1) \lor (p \leftarrow l_2, l_3)\big) \\
     &= P(p \leftarrow l_1) + P(p \leftarrow l_2, l_3) - P\big((p \leftarrow l_1) \land (p \leftarrow l_2, l_3)\big) \\
     &= P(l_1) + P(l_2) \times P(l_3) - P(l_1) \times P(l_2) \times P(l_3)
\end{align*}
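
The closed form for the two-rule example can be checked numerically by enumerating the eight truth assignments of the three independent facts. The following Python sketch does this; the function names and the example probabilities are arbitrary choices for illustration and do not come from ProbLog.

```python
from itertools import product


def prob_by_enumeration(p1, p2, p3):
    """P(l1 or (l2 and l3)) for independent facts, by summing over all worlds."""
    total = 0.0
    for l1, l2, l3 in product([True, False], repeat=3):
        weight = (
            (p1 if l1 else 1 - p1)
            * (p2 if l2 else 1 - p2)
            * (p3 if l3 else 1 - p3)
        )
        if l1 or (l2 and l3):
            total += weight
    return total


def prob_closed_form(p1, p2, p3):
    """The inclusion-exclusion expression for the same disjunction."""
    return p1 + p2 * p3 - p1 * p2 * p3


print(prob_by_enumeration(0.5, 0.4, 0.3), prob_closed_form(0.5, 0.4, 0.3))
```

Both functions agree up to floating-point rounding, which is what the independence assumption predicts.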

Given the independence assumption, any subprogram $L$ of the program's
Herbrand Base has a probability equal to:

\[ P(L) = \prod_{f_i \in L} p_i \cdot \prod_{f_i \notin L} (1 - p_i) \tag{14} \]

With the help of equation (14), one could compute the probability that a
query $q$ holds in a ProbLog program, its success probability, by summing
the probabilities of all subprograms that entail it:

\[ P_s(q) = \sum_{L \models q} P(L) \tag{15} \]

Computing the success probability through equation (15), however, is
computationally infeasible for large programs, since it involves summing through
an exponential number of summands ($2^{|B|}$ different subprograms, where $B$
is the Herbrand Base). By combining equations (14) and (15) and eliminating
redundant terms, we end up with the following characterisation:

\[ P_s(q) = P\Big( \bigvee_{e \in \mathit{Proofs}(q)} \bigwedge_{f_i \in e} f_i \Big) \tag{16} \]
That is, the task of computing the success probability of a query q is trans-
formed into the task of computing the probability of the Disjunctive Normal 
Form (DNF) formula of equation (16). Practically, equation (16) expresses 
that the success probability of query q is equal to the probability that at 
least one of its proofs is sampled. This, unfortunately, is not a question 
of straightforwardly transforming the probability of the DNF to a sum of 
products. Every conjunction (proof) $\bigwedge f_i$ in equation (16) expresses the
probability of a possible world that entails the proof. There are numerous 
such worlds possible, including those that entail other proofs as well. Con- 
sequently, if we were to translate equation (16) to a sum of products, we 
would assume that all different proofs are disjoint (meaning that they repre- 
sent mutually exclusive worlds), which does not hold in the general case. In 
order to make the proofs disjoint, one would have to enhance every conjunc- 
tion with negative literals, in order to exclude worlds whose probability has 
already been computed in previous conjunctions of the DNF. This problem 
is known as the disjoint-sum problem and is known to be #P-hard [10].
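
The overcounting that arises when proofs are treated as disjoint can be seen on a very small example. The Python sketch below computes the success probability exactly by enumerating possible worlds and compares it with a naive sum of products over the proofs. It assumes, as a simplification, that each proof is given directly as a set of independent probabilistic facts; the function names are hypothetical and the enumeration is exponential, so it is meant only for illustration.

```python
from itertools import product
from math import prod


def success_probability(facts, proofs):
    """Exact probability that at least one proof is contained in a sampled world.

    facts:  dict mapping a fact name to its probability.
    proofs: list of sets of fact names (each set is one proof).
    """
    names = sorted(facts)
    total = 0.0
    for choice in product([True, False], repeat=len(names)):
        world = {n for n, included in zip(names, choice) if included}
        p_world = prod(facts[n] if n in world else 1 - facts[n] for n in names)
        if any(proof <= world for proof in proofs):
            total += p_world
    return total


def naive_sum_of_products(facts, proofs):
    """Sum of proof probabilities; only correct when proofs are mutually exclusive."""
    return sum(prod(facts[f] for f in proof) for proof in proofs)


facts = {"a": 0.5, "b": 0.5}
proofs = [{"a"}, {"b"}]
print(success_probability(facts, proofs), naive_sum_of_products(facts, proofs))
```

For two overlapping proofs with probability 0.5 each, the exact value is 0.75 whereas the naive sum is 1.0, because the world containing both facts is counted twice.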

ProbLog's approach consists of using Binary Decision Diagrams (BDDs) 
[S] to compactly represent the DNF of equation (16). A BDD is a binary de- 
cision tree with redundant nodes removed and isomorphic subtrees merged. 
The BDD nodes represent the probabilistic facts of the ProbLog program. 
Every node has a 'positive' and 'negative' outward edge, leading to either a 
child node or the special 'true' or 'false' terminal nodes. The positive out- 
ward edge of the BDD node is labelled with the probability of the respective 
probabilistic fact and the negative edge is labelled with the complement of 
that probability. Positive and negative edges represent distinct decisions on 
inclusion of the relevant fact in the currently sampled possible world; a pos- 
itive edge signifies that the fact represented by its parent node is included 
in the sample with the labelled probability, whereas a negative edge signifies 
that the fact is not included in the sample with the complement of the same 
probability. Therefore, by following a path from the root node to the 'true' 
terminal node, one could sample a conjunction of the DNF formula of equa- 
tion (16). The 'negative' outward edges offer a compact representation of 
the negated literals required to enhance the DNF formula in order to make 
it represent a disjunction over disjoint conjunctions. 

ProbLog inference can be summarised in three general steps. The first 
step is to gather all proofs of the query q by scanning the Selective Linear 
Definite (SLD) tree of proofs and represent them as the DNF formula of 
equation (16). Afterwards, with the help of a built-in translation script, the 
DNF is translated to a BDD. Finally, the probability of this BDD is com- 
puted recursively, starting from the root node and assuming a probability 
of 1 for the 'true' terminal and 0 for the 'false' terminal.
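
The recursive probability computation on a BDD can be sketched in a few lines of Python. The encoding below, in which an inner node is a tuple (fact, positive child, negative child) and the terminals are the Python values True and False, is an assumption made for illustration; it is not the data structure used by ProbLog's BDD package, and a practical implementation would also cache the results of shared nodes.

```python
def bdd_probability(node, probs):
    """Probability of reaching the 'true' terminal from `node`.

    node:  True, False, or a tuple (fact_name, positive_child, negative_child).
    probs: dict mapping each fact name to its probability.
    """
    if node is True:
        return 1.0
    if node is False:
        return 0.0
    name, positive, negative = node
    p = probs[name]
    # Follow the positive edge with probability p, the negative edge with 1 - p.
    return p * bdd_probability(positive, probs) + (1 - p) * bdd_probability(negative, probs)


# Hand-built BDD for the formula l1 or (l2 and l3).
bdd = ("l1", True, ("l2", ("l3", True, False), False))
print(bdd_probability(bdd, {"l1": 0.5, "l2": 0.4, "l3": 0.3}))
```

Because every path fixes each tested fact to either included or excluded, the paths to the 'true' terminal are mutually exclusive, so their probabilities can simply be added.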

With the help of BDDs, ProbLog inference is able to scale to queries 
containing thousands of different proofs [19]. ProbLog's efficiency, as well as
the straightforward adaptation of EC to it, were the driving forces behind
our decision to use this framework for activity recognition under uncertainty. 

Another element of interest in ProbLog is its approach to negation. 
'Negation as failure', expressed in Prolog by not, has similar semantics in 
ProbLog. ProbLog, after all, is bound by the closed-world assumption as 
well: if query q is unprovable in the logical part of the knowledge base, then 
not q has a probability of 1. However, if q is provable, even by a small proba- 
bility of, say, 0.01, then the probability of not q will be 0. Negation as failure 
is not interested in probabilities; it merely checks to see whether the query 
is provable in the logical part of the program. 

In order to compute the complement of the probability of a probabilistic
fact, ProbLog provides the built-in predicate problog_not. For any
probabilistic fact $p_i :: f_i$, we have that:

\[ P_s(\mathtt{problog\_not}(f_i)) = 1 - P_s(f_i) = 1 - p_i \tag{17} \]

The built-in predicate problog_neg is an extension of problog_not applicable
to both probabilistic facts and rules. For a rule $r$, we have:

\[ P_s(\mathtt{problog\_neg}(r)) = 1 - P_s(r) \tag{18} \]

Finally, 'cuts' (!/0) should be used with care by the ProbLog program- 
mer. This is because the nature of cuts causes certain proofs of a query to 
not be sought after, usually for efficiency purposes. This means, however, 
that ProbLog will not be able to collect the entire set of proofs for the query
of interest and thus the DNF of equation (16) will represent a subset of the
total proof set.

ProbLog has been fully integrated in the YAP Prolog system
(http://www.dcc.fc.up.pt/~vsc/Yap/). Further details, examples and code
samples are available in [19] and on the ProbLog website
(http://dtai.cs.kuleuven.be/problog/).



7 The Event Calculus in ProbLog

In this section, we present the necessary transformations in order to make
Crisp-EC ProbLog-compatible; the result is Prob-EC. We also explain the
inference procedure of Prob-EC through two examples, in order to provide
some intuition into how the framework operates.

7.1 Transformation

To express our EC dialect in ProbLog we had to make some minor adjustments
with respect to negation. problog_not can only be used on probabilistic facts
that are part of the knowledge base. For facts that do not belong to the
knowledge base, such as a STA that is not part of an input STA stream,
problog_not fails silently and reports a probability of 0. We had to overcome
this issue because our LTA definitions (see Section 5) demand that we
produce a probability of 1 whenever the 'negated' STA is not detected, but
also produce the complement of the probability of the STA whenever it is
detected. Similarly to problog_not, problog_neg cannot be safely used on goals
that are not inferable from the logical part of the knowledge base. In order
to overcome the negation-related issues, we defined the negate1 and negate2
predicates in Prob-EC:

    negate1(Fact) ← problog_not(Fact) 
    negate1(Fact) ← not Fact                    (19) 

    negate2(Goal) ← problog_neg(Goal) 
    negate2(Goal) ← not Goal                    (20) 

We were thus able to translate our initiation conditions into a probabilistic 
format by combining the properties of both negation as failure and 
probabilistic negation. For instance, rule (13) can now be written as: 

    initiatedAt(fighting(P1, P2) = true, T) ← 
        happensAt(abrupt(P1), T), 
        holdsAt(close(P1, P2, 44) = true, T), 
        negate1(happensAt(inactive(P2), T))            (21) 



With rule (21) we are able to produce a probability of 1 whenever 
happensAt(inactive(P2), T) is not part of our input, as well as produce the 
complement of its probability whenever it is part of our input with a 
probability value attached. The only case in which negate1 will produce a 
probability of 0 is when the probability value of its argument is 1. This is 
the desired behaviour. 
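
The behaviour of negate1 described above can be illustrated with a minimal Python sketch. It is not ProbLog code: the dictionary `knowledge_base`, the tuple encoding of facts and the function name are hypothetical stand-ins for the input STA stream, chosen only to show the three cases (undetected fact, detected fact, fact detected with certainty).

```python
# Hypothetical stand-in for the input STA stream: ground fact -> probability.
# A fact that is absent from the dictionary was not detected at all.
knowledge_base = {
    ("inactive", "sarah", 5): 0.3,
    ("inactive", "mike", 5): 1.0,
}


def negate1(fact, kb):
    """Probability of the negation of `fact`, mirroring the two negate1 clauses."""
    if fact in kb:
        # First clause: the fact is a known probabilistic fact, so the result
        # is the complement of its probability (the role of problog_not).
        # Negation as failure contributes nothing here, because the fact is provable.
        return 1.0 - kb[fact]
    # Second clause: the fact is unprovable, so negation as failure succeeds
    # and the result is a probability of 1.
    return 1.0


print(negate1(("inactive", "sarah", 5), knowledge_base))
print(negate1(("inactive", "mike", 5), knowledge_base))
print(negate1(("inactive", "bob", 5), knowledge_base))
```

Only a fact detected with probability 1 yields a negation probability of 0, as stated above.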

In addition to initiatedAt and terminatedAt rules, we also have to produce 
correct probabilities for holdsAt rules. Rule ([!]) uses negation as failure to 
declare that a fluent holds its value until it is broken by some termination 
condition. In Prob-EC, termination conditions might also have a probability 
attached. We would therefore like to use this probability in order to affect 
the final probability of our holdsAt queries. To achieve this, we transformed 
the domain-independent rule for holdsAt into the following form: 

    holdsAt(F = V, T) ← 
        initiatedAt(F = V, T_s), 
        T_s < T, 
        negate2(broken(F = V, T_s, T))            (22) 



According to rule (22), the probability that F = V holds at time-point T is 
the probability that at least one, among numerous possible, initiation 
conditions has occurred in the past, also taking into account the probability 
that the fluent might have been terminated in the meantime. The use of negate2 
in our domain-independent holdsAt rule ensures that, if the termination 
conditions affecting a given fluent F have certain probabilities, the 
probability that F = V holds at time T will not drop to 0 immediately, but 
will rather decrease by an amount determined by those probabilities. 

Figure 1: Example of LTA probability fluctuation in the presence of weak 
initiation. 

7.2 Inference 

To demonstrate a typical Prob-EC inference task, suppose that two people, 
mike and sarah, are engaging in a 'moving' activity for a number of video 
frames (see Figure 1). The activity is first initiated at frame number 1, when 
both mike and sarah start walking. At frame 2 sarah stops walking (walking is 
required by rule (11) to initiate moving). She instead displays active body 
movement. At frame 21 sarah resumes walking, once again initiating moving. 
At frame 41 sarah continues walking, but mike is inactive and is therefore 
left behind. Consequently, sarah and mike are no longer close enough to each 
other, which triggers the termination condition (12) of moving ('walk away'). 
For simplicity, let all information pertaining to orientation and coordinates 
(input type (ii) of Section 4) be crisp (probability of 1). Moreover, assume 
that the STA walking, active and inactive have probabilities attached, as 
follows: 



    0.70 :: happensAt(walking(mike), 1) 
    0.46 :: happensAt(walking(sarah), 1) 
    0.73 :: happensAt(walking(mike), 2) 
    0.55 :: happensAt(active(sarah), 2) 

    0.69 :: happensAt(walking(mike), 21) 
    0.58 :: happensAt(walking(sarah), 21) 

    0.18 :: happensAt(inactive(mike), 41) 
    0.32 :: happensAt(walking(sarah), 41) 

[Figure 2, SLD tree (image not reproduced): the root query 
holdsAt(moving(sarah, mike) = true, 22) branches into 
initiatedAt(moving = true, 1) and initiatedAt(moving = true, 21). Each branch 
continues with negate2(broken(moving = true, T_s, 22)), for T_s = 1 and 
T_s = 21 respectively. In both branches, problog_neg(broken(...)) fails and 
not broken(...) succeeds.] 

Figure 2: SLD tree for query holdsAt(moving(mike, sarah) = true, 22). 
At frame 2, the query holdsAt(moving(mike, sarah) = true, 2) has a 
probability equal to the probability of the initiation condition of frame 1, 
which, according to rule (11) and given that all coordinate and 
orientation-related information (not shown above) is crisply recognised, is 
the product of the probabilities that both mike and sarah are walking, that 
is, 0.70 × 0.46 = 0.322. This is visualised at the far left of Figure 1, where 
the LTA's probability jumps from 0 to 0.322. From frame 2 to frame 20, mike is 
still walking but sarah is not. Therefore, no initiation or termination 
conditions are fired and the probability of moving remains unchanged. This 
occurs due to the law of inertia and is depicted graphically by the horizontal 
line between frames 2 and 20. At frame 21, however, sarah starts walking 
alongside mike again. Consequently, at frame 22, the query 
holdsAt(moving(mike, sarah) = true, 22) has two initiation conditions to 
consider, one fired at frame 1 and one at frame 21. This occurs because rule 
(22) searches over all time-points between time-point 0 and the current 
time-point for initiation conditions, finding both the condition fired at 
frame 1 and the one fired at frame 21. 

As mentioned in the previous section, ProbLog computes the probability 
of a query by first scanning the entire SLD tree of the query. Figure 2 depicts 
a fragment of the SLD tree for the query 
holdsAt(moving(mike, sarah) = true, 22). The second inference step is to 
represent these proofs as a DNF formula. In our case, the DNF is the 
following: 

    initiatedAt(moving(sarah, mike) = true, 1) ∨ 
    initiatedAt(moving(sarah, mike) = true, 21)            (23) 

where init_1 denotes the first disjunct and init_21 the second. 

We have simplified the representation by omitting the two relevant 'negate2' 
clauses since, as can be seen in Figure 2, they are both provable through 
negation as failure and therefore have a probability of 1. This occurs because 
no termination conditions for moving have been fired between frames 1 and 
22. 

Up to frame 22, there exist only two initiation conditions for the moving 
LTA, init_1 and init_21 (see formula (23)). In the general case, there may 
exist many more initiation conditions in the interval between the start of the 
video and the examined video frame. In addition, for every initiation 
condition, rule (22) will check whether the LTA has been terminated by 
examining the interval between the initiation and the current video frame, 
repeating the process at the next video frame. This leads to numerous 
redundant computations. We overcame this problem by implementing an elementary 
caching technique according to which the probability of holdsAt(F = V, T-1) is 
stored in memory and, therefore, holdsAt(F = V, T) simply checks whether 
the initiation or termination conditions (if any) fired at time-point 
T-1 increase or decrease this probability. This technique operates under the 
assumption that the activity recognition system receives the video frames 
in a temporally sorted manner; this assumption holds in CAVIAR. The 
implementation of this simple caching technique is available with the code 
of our recognition system. 
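
The caching idea can be sketched as a single update step. The following Python fragment is an illustration under stated assumptions, not the implementation distributed with the system: the function name and its parameters are hypothetical, initiation and termination conditions are assumed to be independent of the cached value, and a termination fired at T-1 is assumed to also break an initiation fired at the same time-point.

```python
def update_holds_probability(previous, p_initiated=0.0, p_terminated=0.0):
    """One caching step: probability of holdsAt(F = V, T) from that at T-1.

    previous      -- cached probability of holdsAt(F = V, T-1)
    p_initiated   -- probability of the initiation condition fired at T-1 (0 if none)
    p_terminated  -- probability of the termination condition fired at T-1 (0 if none)
    """
    # An initiation is disjoined with the cached value, assuming independence:
    # P(a or b) = P(a) + P(b) - P(a) * P(b).
    combined = previous + p_initiated - previous * p_initiated
    # A termination keeps only the worlds in which it did not occur.
    return combined * (1.0 - p_terminated)


# The 'moving' example of this section, one call per update.
p_frame_2 = update_holds_probability(0.0, p_initiated=0.70 * 0.46)
p_inertia = update_holds_probability(p_frame_2)  # no condition fired: inertia
p_frame_22 = update_holds_probability(p_inertia, p_initiated=0.69 * 0.58)

print(round(p_frame_2, 3), round(p_frame_22, 3))  # prints: 0.322 0.593
```

Each step touches only the cached value and the conditions fired at the previous time-point, which is why the input must arrive temporally sorted.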

The third step of ProbLog inference involves translating the DNF into 
a BDD. However, our example is simple enough to allow us to perform 
manual calculations, as there exist only two proofs for holdsAt, which are 
easy to disjoin. The probability of DNF formula (23) can be computed as 
the probability of a disjunction of two elements, as explained in Section 6: 

    P(init_1 ∨ init_21) = P(init_1) + P(init_21) - P(init_1 ∧ init_21) 
                        = 0.70 × 0.46 + 0.69 × 0.58 - 0.70 × 0.46 × 0.69 × 0.58 
                        = 0.593 
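
The same value can be cross-checked by brute force, summing the probabilities of all possible worlds in which the DNF is true. The Python sketch below is only a verification aid and does not reflect how ProbLog computes the result (ProbLog uses BDDs); the string keys and function names are hypothetical, and the four walking facts are assumed to be independent.

```python
from itertools import product

# Probabilistic facts involved in DNF (23), assumed mutually independent.
facts = {
    "walking(mike, 1)": 0.70,
    "walking(sarah, 1)": 0.46,
    "walking(mike, 21)": 0.69,
    "walking(sarah, 21)": 0.58,
}


def query_holds(world):
    """DNF (23): init_1 or init_21, each a conjunction of two walking facts."""
    init_1 = world["walking(mike, 1)"] and world["walking(sarah, 1)"]
    init_21 = world["walking(mike, 21)"] and world["walking(sarah, 21)"]
    return init_1 or init_21


def success_probability(prob_facts, query):
    """Sum the weights of all truth assignments (worlds) that satisfy the query."""
    names = list(prob_facts)
    total = 0.0
    for truth_values in product([True, False], repeat=len(names)):
        world = dict(zip(names, truth_values))
        weight = 1.0
        for name in names:
            weight *= prob_facts[name] if world[name] else 1.0 - prob_facts[name]
        if query(world):
            total += weight
    return total


p_moving_22 = success_probability(facts, query_holds)
print(round(p_moving_22, 3))  # prints: 0.593
```

Enumeration is exponential in the number of facts, so it is only practical for toy examples such as this one.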

The probability that mike and sarah are moving at frame 22 has increased, 
owing to the presence of the additional initiation condition of frame 21. 
This is one of the characteristics of Prob-EC: the continuous presence of 
initiation conditions of a particular LTA causes an increase of the probability 
of the LTA. This behaviour is consistent with our intuition: given continuous 
indication that an activity has (possibly) occurred, we are more inclined to 
agree that it has indeed taken place, even if the confidence of every individual 
indication is low. For this reason, from frame 22 up to and including frame 
41, the probability of moving increases, as is visible in Figure 1. In this 
example, at frame 41 the activity's probability has escalated to around 0.8. 

At frame 42, Prob-EC has to take into consideration the termination 
condition that was fired at frame 41. This termination condition, 
corresponding to rule (12), is also probabilistic: it bears the probability 
that sarah walked away from mike, which, according to rule (12) and the fact 
that close is crisply detected, is equal to the probability of the walking STA 
itself, which is 0.32. Therefore, when estimating the probability that, at 
frame 42, mike and sarah are still moving together, we have to incorporate the 
probability of all possible worlds in which sarah did not, in fact, walk away 
from mike. The probability of these worlds is computed by the use of negate2 
in rule (22) and is equal to 1 - 0.32 = 0.68. Therefore, the probability that 
mike and sarah are still moving together at frame 42, in spite of the possible 
termination condition of frame 41, is 0.8 × 0.68 = 0.544. Similarly to the 
steady probability increase given continuous initiation conditions, when faced 
with subsequent termination conditions, the probability of the LTA will 
steadily decrease. The slope of the ascent (descent) is defined by the 
probability of the initiation (termination) conditions involved. For this 
example, we assume that sarah keeps walking away from mike until the end of 
the video, causing the LTA's probability to approximate 0, as shown at the far 
right of Figure 1. 

Figure 3: ProbLog output concerning holdsAt(moving(mike, sarah) = true, T) 
for various video frames T. 

Figure 3 shows the precise ProbLog output for our example. After an 
abrupt jump from 0 to 0.322, the probability remains stable between frames 
2 and 21, indicating that for this period of time the LTA persists through 
the law of inertia. Between frames 22 and 41, the LTA's probability 
monotonically increases, reflecting the repeated initiations of moving that 
occur during that time. After frame 41, it decreases, reflecting the repeated 
termination conditions occurring in that time period. The dashed horizontal 
line at probability 0.5 represents the recognition threshold that we use to 
discern between LTA positives that we consider to be trustworthy enough 
(these are the LTA recognitions) and those that we do not. In other 
domains, recognition thresholds different from 0.5 may be used. 

This example presented the case of a LTA that is recognised after a certain 
number of subsequent initiations. We refer to LTA that may have an occurrence 
initiated more than once as weakly initiated. Additionally, there exist 
LTA that are strongly initiated, that is, a LTA occurrence is associated with 
only one initiation condition and the occurrence persists through the law of 
inertia alone. leaving_object is one such LTA, since it does not make much 
sense to assume that a person might be leaving an object for consecutive 
video frames. 

Figure 4: Example of LTA probability fluctuation in the presence of strong 
initiation. 

The impact of strong initiation on Prob-EC's recognition is significant. 
This is due to a lack of repeated initiations which, as explained in the 
previous example, cause Prob-EC to augment the LTA's probability. The 
probability of a strongly initiated LTA is entirely dependent on the 
probability of its single initiation condition. If this probability is above 
the imposed recognition threshold (0.5 in our case), then the video frames 
during which the LTA persists will all count as recognitions. This behaviour 
can be demonstrated by another example (see Figure 4). Assume that sarah is 
walking while simultaneously carrying a suitcase for 10 video frames. At frame 
11, she leaves the suitcase on the floor and walks away from it. This causes 
the suitcase to 'appear' in the low-level tracking system, triggering the 
leaving_object initiation condition expressed by rule (5). Suppose that this 
initiation condition has a probability of 0.6. At frame 20, sarah picks up the 
suitcase, causing it to 'disappear' and triggering the termination condition 
expressed by rule (6). Suppose that the termination condition also has a 
probability of 0.6. As can be seen in Figure 4, in the absence of any 
initiations after frame 11, the LTA persists entirely due to the law of 
inertia. The probability of this LTA is equal to the probability of the single 
initiation condition, that is, 0.6. Because this probability is above the 
recognition threshold of 0.5, all frames taking place until sarah picks up the 
suitcase will be counted as recognitions. If the probability of the initiation 
condition were below the 0.5 threshold then leaving_object would not have been 
recognised. 
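
The suitcase example can be traced frame by frame with a short Python sketch. The function and its parameters are hypothetical and serve only as an illustration. Two assumptions are made: an initiation at frame T_s affects holdsAt only for frames T > T_s (as in rule (22)), and a termination at frame T_f affects only frames T > T_f.

```python
THRESHOLD = 0.5  # recognition threshold used in this section


def leaving_object_trace(p_init, p_term, init_frame=11, term_frame=20, last_frame=22):
    """Probability of holdsAt(leaving_object = true, T) for T = 1..last_frame."""
    trace = {}
    for t in range(1, last_frame + 1):
        if t <= init_frame:
            trace[t] = 0.0  # not yet initiated
        elif t <= term_frame:
            trace[t] = p_init  # single initiation, persisting by inertia
        else:
            trace[t] = p_init * (1.0 - p_term)  # after the single termination
    return trace


trace = leaving_object_trace(p_init=0.6, p_term=0.6)
recognised = [t for t, p in trace.items() if p >= THRESHOLD]
print(recognised)           # frames 12 to 20
print(round(trace[21], 2))  # 0.24, below the threshold
```

With an initiation probability below 0.5 the list of recognised frames is empty, whatever the duration of the activity.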

Note that in Prob-EC it is possible to transform every strongly initiated 
LTA into a weakly initiated version of the same LTA, by considering every 
frame in which the strongly initiated LTA holds as an initiation condition 
for the weakly initiated version. The following rule shows how this could 
have been implemented in the case of leaving_object: 

    initiatedAt(leaving_object_weakly_initiated(P, Obj) = true, T) ← 
        holdsAt(leaving_object(P, Obj) = true, T)                    (24) 

While this transformation would have been beneficial for our experiments 
in the CAVIAR dataset, since we would be able to augment the probability 
of the weakly initiated version of leaving_object through multiple initiations 
and eventually surpass the 0.5 threshold, it introduces some subtle perils. 
Consider, for example, a scenario in which a leaving_object LTA is initiated 
with a very small probability, such as 0.01, indicating that the sensor's 
confidence about the STA comprising rule (5) is very low. Prob-EC will then 
compute that leaving_object holds with a probability of 0.01 up until the 
video frame, if any, where the object in question is picked up. These positives 
will be correctly discarded given their very small probabilities. The weakly 
initiated version of leaving_object, however (see rule (24)), will continue to 
augment the probability of this LTA, eventually surpassing the threshold of 
0.5 and thus producing a potentially large number of False Positives (FP), 
particularly if the video frames during which the object is left on the floor 
are numerous. While it is true that such situations do not arise in CAVIAR, 
it is very possible that they might take place in typical activity 
recognition applications, particularly where an emphasis on tracking 
unattended packages is given. 
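
To give a sense of scale, the following Python sketch estimates how many repeated initiations of probability 0.01 are needed before the weakly initiated version crosses the threshold. It assumes that each initiation is combined with the previous probability as an independent event, as in the disjunction computed in Section 7.2; the function names are hypothetical, and the frame count is a consequence of that assumption rather than a figure measured on CAVIAR.

```python
def probability_after(n_initiations, p_init):
    """Disjunction of n independent initiations, each with probability p_init."""
    return 1.0 - (1.0 - p_init) ** n_initiations


def initiations_to_exceed(threshold, p_init):
    """Smallest n for which probability_after(n, p_init) exceeds the threshold."""
    n = 0
    while probability_after(n, p_init) <= threshold:
        n += 1
    return n


print(initiations_to_exceed(0.5, 0.01))  # prints: 69
```

Under this assumption, an object left on the floor for about 70 frames would already produce FP, even though no single initiation is remotely trustworthy.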

Similar to being weakly or strongly initiated, a LTA may be weakly or 
strongly terminated. For instance, moving is weakly terminated, as it may 
be terminated at consecutive video frames, while leaving_object is strongly 
terminated at the frame in which a person picks up the object. In general, 
there is no fixed relation between the initiation and termination of a fluent: 
for example, a LTA may be weakly initiated and strongly terminated. In 
the case of strong termination, that is, in the absence of subsequent 
terminations, the probability of a LTA may not drop below the (0.5) 
recognition threshold, thus possibly resulting in false persistence. 



8 Experimental Results 

We experimented on the 28 surveillance videos of the CAVIAR dataset which 
contain, in total, 26419 video frames. These frames have been manually 
annotated by the CAVIAR team to provide the ground truth for STA and 
LTA (we performed very minor editing of the annotation in order to 
introduce a STA for abrupt motion). Table 2 shows Crisp-EC's and Prob-EC's 
results. These results have been produced by computing queries of the form 
holdsAt(LTA = true, T). Due to our implementation of Prob-EC's negation 
through the negate1 and negate2 predicates (see Section 7), Prob-EC has 
identical results to Crisp-EC. 

LTA              TP     FP     FN     Precision   Recall   F-measure 
meeting          3099   1910   525    0.619       0.855    0.718 
moving           4008   2162   2264   0.650       0.639    0.644 
fighting         531    97     729    0.421       0.845    0.562 
leaving object   143    1539   55     0.085       0.722    0.152 

Table 2: True Positives (TP), False Positives (FP), False Negatives (FN), 
Precision, Recall and F-measure for Crisp-EC and Prob-EC on CAVIAR 
without artificial noise. 
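
For reference, the derived columns of Table 2 follow from the counts through the standard definitions of Precision, Recall and F-measure. The Python sketch below assumes those standard definitions (with the F-measure taken as the harmonic mean of Precision and Recall) and reproduces the meeting row; the function name is hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Standard Precision, Recall and F-measure from frame counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts of the meeting row of Table 2.
precision, recall, f_measure = precision_recall_f(tp=3099, fp=1910, fn=525)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
# prints: 0.619 0.855 0.718
```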

Particularly notable in the results is the low Precision for the leaving_object 
LTA, owing to a substantial number of False Positives (FP). This is due to 
the problematic annotation of CAVIAR with respect to this LTA. For 
example, in video 14, the object that is left at frame 946, triggering an 
initiation of leaving_object, is picked up ('disappears') at frame 1354. 
However, the relevant annotation mistakenly reports that leaving_object stops 
occurring at frame 996. We therefore end up with 358 FP which could have been 
avoided with a more consistent annotation of this video. Similarly, in the 
annotation of videos 17 and 18, a large number of annotated frames are 
missing. Video 16 includes a particularly interesting case of leaving_object. 
In this video, a person leaves a bag next to a chair, exits the scene, re-enters 
after a couple of seconds and picks up the bag. When the person re-enters, 
he is assigned a new identifier (this is common in CAVIAR). Various 
complications arise due to this. Firstly, the original leaving_object activity 
is not terminated by our rules when the person in question 'disappears'. This 
is deliberate on our part: we choose to terminate leaving_object when the 
object is picked up rather than when the person that leaves it 'disappears' 
from the sensor's point of view. We thus emphasise the time-points in which 
a package might be unattended. The CAVIAR annotation, however, views 
the leaving_object LTA from a different perspective and thus assumes that 
the activity is terminated when the person 'disappears'. This difference in 
perspective leaves us with a substantial number of FP, one for each frame 
in which the person is not present in the scene. When the person re-enters the 
scene, CAVIAR provides the person with a new identifier and resumes the 
annotation of leaving_object with a <new_person_id, same_object_id> tuple. 
After a couple of frames, the person (described by new_person_id) picks 
up the object, terminating a leaving_object(new_person_id, same_object_id) 
LTA occurrence which Crisp-EC never initiated in the first place. Thus, in 
addition to a substantial number of FP, we also generate 55 False Negatives 
(FN) (see Table 2) because Crisp-EC never recognises the new leaving_object 
activity. 

According to the manual annotation of the CAVIAR dataset, all STA are 
associated with a probability of 1. In real-world activity recognition 
applications it is unrealistic to assume that STA will be detected with 
certainty. In order to experiment with a more realistic setting, we added 
artificial noise to the dataset, in the form of probabilities attached to the 
detected STA. Toward this end, we used a Gamma distribution with a varying 
mean in order to represent different levels of noise. 

Our experimental procedure may be summarised as follows. 

• We use two different approaches for adding probabilities to the input 
facts: 

1. We add probabilities to STA only ('smooth noise'). 

2. In addition to STA, we add probabilities to their associated co- 
ordinate and orientation fluents ('strong noise'). STA are not re- 
quired to have the same probability as their associated coordinate 
and orientation fluents. 

Thus we end up with two different noisy versions of CAVIAR. 

• We feed these data to Prob-EC and filter its output (a series of 
positives of the form Prob :: holdsAt(LTA = true, T)) to keep only 
the positives with probability of 0.5 and above, indicating that we 
trust these positives to be accurate. 

• We filter both noisy versions of the dataset by erasing the facts with 
probability below 0.5. We retain the facts with probability above 0.5, 
removing their probability values. Thus we assume that such facts 
have been tracked with certainty. For 'smooth noise', this step means 
that we remove a certain amount of STA, whereas for 'strong noise' 
we additionally remove coordinate and orientation fluents. 

• We give these filtered versions of CAVIAR as input to Crisp-EC. With 
this step we aim to estimate the impact of environmental noise in 
Crisp-EC, by assuming that Crisp-EC can only reason on facts that 
have been tracked with relative certainty, that is, facts with probability 
above 0.5. 

• We compare the performance of Crisp-EC and Prob-EC. 
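
The two filtering steps of the procedure above can be sketched in Python as follows. This is an illustration only: the function names and the sample facts are hypothetical, and the treatment of a fact with probability exactly 0.5 in the Crisp-EC input is an assumption, since the procedure only specifies what happens below and above the threshold.

```python
THRESHOLD = 0.5


def crisp_input(noisy_facts, threshold=THRESHOLD):
    """Input for Crisp-EC: keep facts above the threshold and drop their probabilities."""
    # Assumption: a fact with probability exactly equal to the threshold is erased.
    return [fact for fact, probability in noisy_facts if probability > threshold]


def trusted_positives(prob_ec_output, threshold=THRESHOLD):
    """Prob-EC recognitions: keep positives with probability of 0.5 and above."""
    return [(fact, probability) for fact, probability in prob_ec_output
            if probability >= threshold]


# Hypothetical noisy STA and hypothetical Prob-EC output, for illustration only.
noisy_sta = [
    ("happensAt(walking(mike), 1)", 0.70),
    ("happensAt(walking(sarah), 1)", 0.46),
]
prob_ec_output = [
    ("holdsAt(moving(mike, sarah) = true, 2)", 0.322),
    ("holdsAt(moving(mike, sarah) = true, 22)", 0.593),
]

print(crisp_input(noisy_sta))
print(trusted_positives(prob_ec_output))
```

In this toy input, Crisp-EC loses the walking STA of sarah and can no longer initiate moving at frame 1, whereas Prob-EC retains both facts and recognises moving at frame 22.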

We repeated the above experimental procedure 16 times, once for each 
Gamma distribution mean value between 0.5 and 8.0 inclusive, with a step 
of 0.5. Noise is added randomly; for example, a STA that is erased from 
Crisp-EC's input for Gamma mean 6.0 might be present in the dataset 
produced for Gamma mean 6.5. In our implementation, the higher the mean, 
the lower the probabilities attached to the ground input facts, indicating 
a higher amount of noise. The impact of the gradual increase of noise in 
CAVIAR's STA can be seen in Figure 5, where the number of occurrences 
of the STA walking with probability above 0.5 is plotted. The number of 
walking occurrences drops exponentially as we increase the level of noise. 
All other STA follow the same pattern. In the case of 'strong noise', the 
occurrences of coordinate and orientation fluents also drop exponentially. 

Figure 5: Occurrences of walking STA in CAVIAR per Gamma mean value. 
Mean value 0.0 represents the CAVIAR dataset without artificial noise. 

Although the noise introduced may seem extreme (under high Gamma 
mean values), it is actually quite realistic. In activity recognition from video 
content, STA usually have very low probabilities: video tracking systems 
very rarely produce STA with high confidence (refer, for example, to the 
Mind's Eye dataset: http://www.visint.org/datasets). 

The loss of STA in the input of Crisp-EC can have a severe impact on 
its recognition accuracy. Certain STA or associated fluents might define 
initiation conditions that would have led to the recognition of LTA, if the STA 
had not been erased. This may lead to many FN for Crisp-EC. Termination 
conditions might also not be satisfied, leading to a false persistence of LTA 
which may create FP. Given that Crisp-EC makes mistakes in the original 
dataset (that is, CAVIAR without artificial noise), it may occur that a FP in 
the original dataset becomes a True Negative (TN) in a noise-altered version 
of CAVIAR ('smooth' or 'strong' noise). Similarly, a FN in the original 
dataset may become a True Positive (TP). To demonstrate this with an example, 
assume that two people are moving along together on a pavement. Suddenly they 
have to step aside to allow a handicapped person full access to the pavement. 
In Crisp-EC, this would fire the 'walk away' termination condition (see rule 
(12)) because the distance between the two people would exceed the 
pre-specified threshold. This does not mean that the people have stopped 
moving; they merely had to distance themselves momentarily. Erroneously firing 
the 'walk away' termination condition creates FN for some video frames. If, 
however, the walking STA is not tracked during the frames at which the people 
make way for the handicapped person, moving will never be terminated, which, 
in this special case, adds True Positives (TP) to the evaluation. 

Figure 6: Crisp-EC and Prob-EC F-measure per Gamma mean value for all 
LTA under 'smooth noise'. 

8.1 'Smooth Noise' Experiments 

Figure 6 compares the recognition accuracy of Crisp-EC and Prob-EC on 
all LTA of interest in terms of F-measure, under 'smooth noise'. In all ex- 
periments, we plot the F-measure per Gamma distribution mean averaged 
over 5 different runs, that is, each point in the diagram is the average of 
5 different F-measure values. The vertical error bars display the standard 
deviations. 

The recognition accuracy of Prob-EC is higher than that of Crisp-EC 
in meeting, moving and fighting. In meeting (see Figure 6(a)), the 
accuracy of Crisp-EC starts falling from Gamma mean 5.0 onwards, because the 
active and inactive STA tend to be erased from its input, since they receive 
probability below 0.5 when we generate noise. Prob-EC, on the other hand, 
is able to initiate meeting with a certain degree of probability (recall the 
rules of Section 5 that express the conditions in which meeting is 
initiated). After a number of frames, the repeated initiation of meeting leads 
to a holdsAt probability of 0.5 or above and thus Prob-EC eventually 
recognises meeting (refer to the first example of Section 7 for a detailed 
explanation of this behaviour). Those initiation conditions occur very 
frequently in CAVIAR. Therefore, meeting ends up being recognised with a 
probability close to 1. 

In the case of moving (see Figure 6(b)), it is the loss of multiple 
occurrences of walking (due to noise) that causes Crisp-EC to suffer from 
numerous FN (see rule (11) for the initiation condition of moving). Prob-EC, on 
the other hand, uses repeated initiation to eventually surpass the recognition 
threshold of 0.5. The recognition accuracy of Crisp-EC in moving 
deteriorates earlier than that of meeting, falling behind Prob-EC from Gamma 
mean 2.0. This occurs because the initiation condition of moving bears two 
probabilistic conjuncts in its body, corresponding to two separate 
occurrences of the walking STA. Therefore, it is affected by noise twice. The 
initiation conditions of meeting may sometimes have two probabilistic 
conjuncts, but usually they have just one, since the negate1 conditions are 
usually crisp. 

Similar to moving and meeting, Prob-EC fares better than Crisp-EC 
in fighting (see Figure 6(c)). For high levels of noise, that is, high Gamma 
mean values, Crisp-EC has trouble initiating fighting (see rule (13)), whereas 
Prob-EC uses repeated initiation to surpass the recognition threshold of 
0.5. Prob-EC manages to surpass this threshold despite the relatively short 
duration of fights, compared to other LTA. 

leaving_object (see Figure 6(d)) is an interesting special case, owing to 
the LTA's implicit strong initiation. Recall from the discussion of Section 7 
that, by strong initiation, we refer to the case where a LTA is recognised 
through a single occurrence of its initiation conditions and persists solely 
through the law of inertia. In the CAVIAR videos, a person leaves an object 
after a few frames in which he is walking. In rare situations it is 
possible that either Crisp-EC or Prob-EC misses the recognition of person due 
to noise and, consequently, the recognition of leaving_object. Such cases did 
not arise in our 'smooth noise' experiments. In these experiments, Prob-EC and 
Crisp-EC are equally accurate. When person is recognised by Crisp-EC, it 
is recognised with a sufficiently high probability by Prob-EC and vice versa. 
As mentioned in Section 7, we could have achieved better performance with 
Prob-EC on leaving_object in CAVIAR by introducing a weakly initiated 
version of the LTA, but doing so could affect the performance of our 
framework with respect to leaving_object in other activity recognition 
settings. 

LTA in CAVIAR are usually terminated when an entity 'disappears' or 
when people walk away from each other. In the latter case, the probability 
of the LTA termination depends solely on that of walking (see, for example, 
rule (12)). In general, probabilistic terminations are similar to probabilistic 
initiations. When a termination condition is repeatedly fired with probabil- 
ities below 0.5 then Prob-EC eventually stops recognising the LTA, whereas 
Crisp-EC does not, producing FP. 



[Figure 7 (plots not reproduced): F-measure of Crisp-EC and Prob-EC against 
the Gamma distribution mean, including panels (c) Fighting and (d) Leaving 
Object.] 

Figure 7: Crisp-EC and Prob-EC F-measure per Gamma mean value for all 
LTA under 'strong noise'. 



8.2 'Strong Noise' Experiments 

Figure [7] depicts our results under 'strong noise'. Compared to 'smooth 
noise', both Prob-EC and Crisp-EC were more severely affected. Prob-EC 
outperforms Crisp-EC in meeting (see Figure 7(a)) even at lower noise levels, 
that is, lower Gamma mean values, compared to the 'smooth noise' experi- 
ments. This occurs because, in addition to losing the active and inactive STA 
due to noise, Crisp-EC now has trouble proving that entities are close, be- 
cause under the strong noise assumption we also remove coordinate-related 
information, required to compute the distance between two entities. Thus, 
even in cases where the active or inactive STA are present and indicate that 
the relevant frame might be an initiation condition for meeting, Crisp-EC is 
unable to prove that the two entities are close enough to initiate the LTA. 
Prob-EC, on the other hand, uses repeated initiation to eventually break 
the 0.5 barrier and produce recognitions. 

Concerning moving (see Figure 13b)), Crisp-EC suffers again from the 
loss of the walking STA. Under 'strong noise', this conclusion is more strik- 
ing: after Gamma mean 3.5, Crisp-EC is unable to produce a single positive. 
This is because, in addition to the walking STA and associated coordinate 
fluents being erased from Crisp-EC's input, orientation fluents (which affect 
rule (11)) are also candidates for removal. As a result, even in cases where
both entities potentially involved in a moving activity are believed to be
walking close to each other by the low-level tracking system, the absence of
orientation information leads to an inability to initiate moving.

Figure 8: Probabilities of true and false positives for a Prob-EC moving
experiment at Gamma mean 2.5.

What is perhaps more interesting is that Prob-EC performs worse than
Crisp-EC in moving. This is because, whenever faced with an initiation
condition for moving, Prob-EC has to calculate the probability of this
initiation condition as a product of 6 probabilities in total. (Recall from
rule (7) that close is defined in terms of two coordinate input facts. It
therefore contributes the product of 2 probabilities.) Consequently, Prob-EC
has trouble surpassing the 0.5 recognition threshold we require to trust its 
positives. Even in cases of near-certainty about some conditions of the
input, ProbLog's independence assumption leads to very low probability
values. Consider, for example, two entities, id1 and id2, whose STA (in our
case, walking) and associated information (coordinates, orientation) are all
tracked with a probability of 0.8. Whereas Crisp-EC will be able to
initiate moving, since all the facts will make their appearance in the input,
Prob-EC will produce an initiation condition probability of (0.8)^6 ≈ 0.262.
Consequently, we will not trust the probability of the relevant holdsAt query. 
Furthermore, because each initiation condition usually has a very
low probability, repeated initiation does not overcome the 0.5 threshold. This
phenomenon is depicted in Figure 8, where we plot the probabilities
of all positives for one experiment concerning moving, with Gamma
mean 2.5. In this case, positives are ground holdsAt(moving(ID1, ID2) = true,
T) produced by Prob-EC. Some of these positives are true (see Figure 8(a))
and the rest are false (see Figure 8(b)). In these figures, the horizontal
axis represents the index of the positives. The index corresponds to the
order in which a positive was generated, that is, the first positive of the video
that was evaluated first has index 1, the first positive in the second video
might have index 300, and so on. As shown in the graphs, not one
positive of Prob-EC gains a probability above 0.5. This is despite the increasing
probability of consecutive positives that is observed in some cases, which is
due to repeated initiation. Therefore, both Precision and Recall, and thus
F-measure, are equal to 0.
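
The arithmetic behind the 0.8 example can be checked with a minimal Python
sketch. It is not part of Prob-EC: the function names are hypothetical, the
split of the six facts (two walking STA, two coordinate facts, two orientation
facts) is an assumed reading of the text, and the rule that Crisp-EC keeps
every input fact whose probability is at least 0.5 is a simplifying assumption.

```python
from math import prod

THRESHOLD = 0.5  # recognition threshold used in the experiments


def crisp_initiates(fact_probs, threshold=THRESHOLD):
    """Assumed Crisp-EC view: a fact survives in the input if its probability
    reaches the threshold, and initiation needs every fact to be present."""
    return all(p >= threshold for p in fact_probs)


def prob_initiation(fact_probs):
    """Prob-EC view under the independence assumption: the probability of
    a conjunction of input facts is the product of their probabilities."""
    return prod(fact_probs)


# Six input facts, each tracked with probability 0.8 (assumed breakdown:
# 2 walking STA, 2 coordinate facts for close, 2 orientation facts).
facts = [0.8] * 6

print(crisp_initiates(facts))              # True
print(round(prob_initiation(facts), 3))    # 0.262
print(prob_initiation(facts) > THRESHOLD)  # False
```

Under these assumptions the crisp check succeeds while the probabilistic
initiation stays well below the 0.5 threshold, which matches the behaviour
described for moving.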

With respect to fighting (see Figure 7(c)), Prob-EC outperforms
Crisp-EC for certain Gamma mean values. For high Gamma mean values, Prob-EC
and Crisp-EC are equally inaccurate. In the case of 'strong noise', there are 
at least three probabilistic conjuncts per initiation condition. Due to the 
fact that fights have a relatively short duration (much shorter than meeting, 
for example), there are not enough consecutive initiations to raise Prob- 
EC's probability above the 0.5 threshold (in the case of high Gamma mean 
values). 

Regarding leaving_object (see Figure 7(d)), Crisp-EC seems to fare slightly
better than Prob-EC for low levels of noise, that is, low Gamma mean
values. Under the 'strong noise' assumption, Prob-EC has to consider more
probabilistic conjuncts per initiation condition, due to the presence of close.
As a result, and given the strong initiation of leaving_object, Prob-EC tends
to produce probabilities below 0.5 for this LTA, even in cases where the
STA probabilities themselves might be above 0.5 and therefore sufficient to
allow Crisp-EC to recognise leaving_object. For higher noise levels (Gamma
mean values), Prob-EC and Crisp-EC are equally inaccurate, because the
data required by Crisp-EC to initiate leaving_object, whether it is the person
fluent, the inactive STA or the coordinate fluents of the entities involved,
are missing.

8.3 Conclusions 

Our experimental evaluation showed that Prob-EC outperforms Crisp-EC 
when: 

• a LTA is weakly initiated, and 

• the LTA depends on a small number of probabilistic conjuncts. 

Under these conditions, Prob-EC outperforms Crisp-EC from average to
high noise levels (Gamma mean values). (At low noise levels, Prob-EC is
at least as accurate as Crisp-EC.) Moreover, Prob-EC outperforms
Crisp-EC even when the duration of a LTA is
relatively short, that is, when there is a small number of repeated initiations.

When a LTA depends on a large number of probabilistic conjuncts, Prob- 
EC may outperform Crisp-EC, but this is not always the case. ProbLog 
makes an independence assumption about input facts and thus the product 
of the probabilities of many probabilistic conjuncts may be very small, even 
if the probability of each individual conjunct is high. The greater the number 
of probabilistic conjuncts, the more initiations Prob-EC requires to surpass 
the (0.5) recognition threshold. In the presence of low noise levels, Crisp-EC 
is at least as accurate as Prob-EC, whereas when the noise levels are high 
Prob-EC is at least as accurate as Crisp-EC. 
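
The link between the number of conjuncts and the number of initiations
needed can be illustrated with a small Python sketch. It rests on assumptions
that go beyond the text: every conjunct has the same probability, successive
initiations depend on distinct independent input facts (so their disjunction
is a noisy-or), and no termination occurs in between. The function names are
hypothetical.

```python
import math


def noisy_or(init_probs):
    """Probability that at least one of several independent
    initiation conditions holds."""
    return 1.0 - math.prod(1.0 - p for p in init_probs)


def initiations_needed(conjunct_prob, n_conjuncts, threshold=0.5):
    """Smallest number of equally likely, independent initiations whose
    noisy-or strictly exceeds the recognition threshold."""
    p_init = conjunct_prob ** n_conjuncts  # product of the conjuncts
    if p_init <= 0.0:
        return None  # the threshold can never be reached
    n = 1
    while noisy_or([p_init] * n) <= threshold:
        n += 1
    return n


# Conjuncts tracked with probability 0.8 each.
for k in (2, 4, 6):
    print(k, initiations_needed(0.8, k))
# 2 1
# 4 2
# 6 3
```

With noisier input the required count grows quickly, so a short LTA may end
before enough initiations have accumulated.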

In the case of strong initiation, there are situations in which Prob-EC
may fare significantly better than Crisp-EC and vice versa, but these did 
not arise in our experiments. In general, Prob-EC is expected to have similar 
performance to Crisp-EC on strongly initiated LTA in the case of a small 
number of probabilistic conjuncts, while Crisp-EC is likely to perform better 
in the case of a large number of such conjuncts. 

The experimental results concerning weak/strong termination are sim- 
ilar to those regarding weak/strong initiation. For example, Prob-EC out- 
performs Crisp-EC in the case of weakly terminated LTA that depend on 
a small number of probabilistic conjuncts. The repeated termination allows 
Prob-EC to stop recognising a LTA when Crisp-EC continues the recognition 
producing FP. 
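
The effect of repeated termination can be sketched in Python as follows. This
is an illustration under stated assumptions, not the Prob-EC implementation:
there is a single initiation, termination conditions are independent, the LTA
still holds only if none of them occurred, and Crisp-EC is assumed to discard
any input whose probability is below 0.5. The function names are hypothetical.

```python
def prob_still_holds(p_initiated, termination_probs):
    """Probability that the LTA still holds after a sequence of independent
    termination conditions: P(initiated) * prod(1 - t_i)."""
    p = p_initiated
    for t in termination_probs:
        p *= 1.0 - t
    return p


def crisp_still_holds(initiated, termination_probs, threshold=0.5):
    """Assumed Crisp-EC view: a termination only counts if its probability
    reaches the threshold; weaker terminations are dropped from the input."""
    return initiated and not any(t >= threshold for t in termination_probs)


weak_terminations = [0.3, 0.3]  # each below the 0.5 threshold

print(round(prob_still_holds(0.9, weak_terminations[:1]), 3))  # 0.63
print(round(prob_still_holds(0.9, weak_terminations), 3))      # 0.441
print(crisp_still_holds(True, weak_terminations))              # True
```

After the second weak termination the probabilistic value falls below 0.5 and
recognition stops, while the crisp check keeps recognising the LTA, which is
the source of the false positives mentioned above.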

Finally, we note that Prob-EC is at least as accurate as Crisp-EC in 
the presence of high noise levels, considering any number of probabilistic 
conjuncts and any combination of weak/strong initiation/termination. Con- 
sequently, we expect that Prob-EC will fare better in real-world recognition 
applications where sensor data are highly noisy. 

9 Summary and Future Work 

We presented Prob-EC, an Event Calculus dialect for probabilistic rea- 
soning. We performed extensive experimental evaluation of Prob-EC on a 
benchmark activity recognition dataset. The evaluation showed the condi- 
tions in which Prob-EC outperforms a crisp Event Calculus. Prob-EC is the 
first Event Calculus dialect able to deal with uncertainty in the input STA. 
Moreover, this is the first approach that thoroughly evaluates the Event 
Calculus in a probabilistic framework. 

There are several directions for further work. First, we intend to exper- 
iment with additional types of noise — consider, for example, the case in 
which the low-level tracking system 'confuses' two STA. Second, we aim to 
extend Prob-EC in order to support intervals as described in [3]. We intend 
to allow the computation of queries of the form 'what was the probability
that mike and sarah were moving continuously during the interval [21, 41]?'
(see Figure 1). Third, we aim to incorporate advanced caching techniques
in Prob-EC in order to make it suitable for run-time activity recognition, 
where STA arrive with a variable delay and where there is a need to dy- 
namically update the intervals of already recognised LTA. Finally, we aim 
to address the issue of imprecise LTA definitions. We intend to combine the 
reasoning engine of Prob-EC with that of the MLN-based Event Calculus 
[37] in order to provide a unified framework that will be able to deal with
both STA detection probabilities and imperfect LTA definitions. 




Acknowledgements 

This work was supported by the EU PRONTO Project (FP7-ICT 231738). 
We are very grateful to the ProbLog developing team whose feedback con- 
tributed greatly to our grasp of the language. The authors themselves, how- 
ever, are solely responsible for any misunderstanding about its use. We have 
also benefited from discussions with Marek Sergot on the Event Calculus. 

References 

[1] J. Allen. Maintaining knowledge about temporal intervals.
Communications of the ACM, 26(11):832-843, 1983.

[2] A. Artikis, G. Paliouras, F. Portet, and A. Skarlatidis. Logic-based 
representation, reasoning and machine learning for event recogni- 
tion. In Proceedings of Conference on Distributed Event-Based Systems 
(DEBS), pages 282-293. ACM Press, 2010. 

[3] A. Artikis, M. Sergot, and G. Paliouras. A logic programming approach 
to activity recognition. In Proceedings of ACM Workshop on Events in 
Multimedia, 2010. 

[4] A. Artikis, A. Skarlatidis, and G. Paliouras. Behaviour recognition from 
video content: A logic programming approach. International Journal 
of Artificial Intelligence Tools, 19(2):193-209, 2010. 

[5] Rahul Biswas, Sebastian Thrun, and Kikuo Fujimura. Recognizing ac- 
tivities with multiple cues. In Ahmed M. Elgammal, Bodo Rosenhahn, 
and Reinhard Klette, editors, Workshop on Human Motion, volume 
4814 of Lecture Notes in Computer Science, pages 255-270. Springer, 
2007. 

[6] Matthew Brand, Nuria Oliver, and Alex Pentland. Coupled hidden 
markov models for complex action recognition. In CVPR, pages 994- 
999. IEEE Computer Society, 1997. 

[7] W. Brendel, A. Fern, and S. Todorovic. Probabilistic event logic 
for interval-based event recognition. In Computer Vision and Pat- 
tern Recognition (CVPR), 2011 IEEE Conference on, pages 3329-3336. 
IEEE, 2011. 

[8] Maurice Bruynooghe, Theofrastos Mantadelis, Angelika Kimmig, Bernd 
Gutmann, Joost Vennekens, Gerda Janssens, and Luc De Raedt. 
Problog technology for inference in a probabilistic first order logic. In 
ECAI 2010 - 19th European Conference on Artificial Intelligence, 2010. 



[9] R. Bryant. Graph-based algorithms for boolean function manipulation. 
IEEE Transactions on Computers, 35(8):677-691, 1986. 

[10] G. Cugola and A. Margara. Processing flows of information: From data 
stream to complex event processing. ACM Computing Surveys, 2011. 

[11] C. Dousson and P. Le Maigat. Chronicle recognition improvement us- 
ing temporal focusing and hierarchisation. In Proceedings of Interna- 
tional Joint Conference on Artificial Intelligence (IJCAI), pages 324- 
329, 2007. 

[12] Daan Fierens, Guy Van den Broeck, Ingo Thon, Bernd Gutmann, and 
Luc De Raedt. Inference in probabilistic logic programs using weighted 
cnf's. In Proceedings of the 27th Conference on Uncertainty in Artificial 
Intelligence (UAI), July 2011, 2011. 

[13] Matthew L. Ginsberg. Bilattices and modal operators. In TARK, pages 
273-287, 1990. 

[14] S. Gong and T. Xiang. Recognition of group activities using dynamic 
probabilistic networks. In Computer Vision, 2003. Proceedings. Ninth 
IEEE International Conference on, pages 742-749. IEEE, 2003. 

[15] A. Hakeem and M. Shah. Learning, detection and representation of 
multi-agent events in videos. Artificial Intelligence, 171(8-9):586-605, 
2007. 

[16] Rim Helaoui, Mathias Niepert, and Heiner Stuckenschmidt. Recog- 
nizing interleaved and concurrent activities: A statistical-relational ap- 
proach. In PerCom, pages 1-9. IEEE, 2011. 

[17] S. Hongeng and R. Nevatia. Large-scale event detection using semi- 
hidden markov models. In Computer Vision, 2003. Proceedings. Ninth 
IEEE International Conference on, pages 1455-1462. IEEE, 2003. 

[18] Aniruddha Kembhavi, Tom Yeh, and Larry S. Davis. Why did the 
person cross the road (there)? scene understanding using probabilistic 
logic models and common sense reasoning. In ECCV (2), pages 693- 
706, 2010. 

[19] A. Kimmig, B. Demoen, L. De Raedt, V. Santos Costa, and R. Rocha. 
On the implementation of the probabilistic logic programming language 
problog. Theory and Practice of Logic Programming, 2010. 

[20] D. Kosmopoulos, P. Antonakaki, K. Valasoulis, A. Kesidis, and 
S. Perantonis. Human behaviour classification using multiple views. 
In Proceedings of Hellenic Conference on Artificial Intelligence, volume 
5138. Springer, 2008. 



[21] R. Kowalski and M. Sergot. A logic-based calculus of events. New 
Generation Computing, 4(1):67-96, 1986.

[22] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Con- 
ditional random fields: Probabilistic models for segmenting and labeling 
sequence data. In Carla E. Brodley and Andrea Pohoreckyj Danyluk, 
editors, ICML, pages 282-289. Morgan Kaufmann, 2001. 

[23] L. Liao, D. Fox, and H. Kautz. Hierarchical conditional random fields 
for gps-based activity recognition. Robotics Research, pages 487-506, 
2007. 

[24] Vlad I. Morariu and Larry S. Davis. Multi-agent event recognition 
in structured scenarios. In Computer Vision and Pattern Recognition 
(CVPR), 2011. 

[25] K.P. Murphy. Dynamic Bayesian Networks: representation, inference 
and learning. PhD thesis, University of California, 2002. 

[26] P. Natarajan and R. Nevatia. Hierarchical multi-channel hidden semi 
markov models. In Proc. International Joint Conference on Artificial 
Intelligence (IJCAI'07), pages 2562-2567, 2007.

[27] A. Paschke. ECA-RuleML: An approach combining ECA rules with 
temporal interval-based KR event/action logics and transactional
update logics. Technical Report 11, Technische Universität München,
2005. 

[28] A. Paschke and M. Bichler. Knowledge representation concepts for 
automated SLA management. Decision Support Systems, 46(1):187- 
205, 2008. 

[29] A. Paschke and A. Kozlenkov. Rule-based event processing and reaction 
rules. In Proceedings of RuleML, volume LNCS 5858, pages 53-66. 
Springer, 2009. 

[30] L. Rabiner and B. Juang. An introduction to hidden markov models. 
ASSP Magazine, IEEE, 3(1):4-16, 1986. 

[31] Matthew Richardson and Pedro Domingos. Markov logic networks. 
Machine Learning, 62(1-2):107-136, 2006. 

[32] A. Sadilek and H. Kautz. Location-based reasoning about complex 
multi-agent behavior. Journal of Artificial Intelligence Research, 43:87- 
133, 2012. 



[33] J. Selman, M. Amer, A. Fern, and S. Todorovic. Pel-cnf: Probabilistic 
event logic conjunctive normal form for video interpretation. In Com- 
puter Vision Workshops (ICCV Workshops), 2011 IEEE International 
Conference on, pages 680-687. IEEE, 2011. 

[34] V. Shet, D. Harwood, and L. Davis. VidMAP: video monitoring of 
activity with Prolog. In Proceedings of International Conference on 
Advanced Video and Signal Based Surveillance (AVSS), pages 224-229. 
IEEE, 2005. 

[35] V. Shet, J. Neumann, V. Ramesh, and L. Davis. Bilattice-based logical 
reasoning for human detection. In Proceedings of International Confer- 
ence on Computer Vision and Pattern Recognition (CVPR), pages 1-8. 
IEEE, 2007. 

[36] J.M. Siskind. Grounding the lexical semantics of verbs in visual per- 
ception using force dynamics and event logic. Journal of Artificial In- 
telligence Research, 15:31-90, 2001. 

[37] Anastasios Skarlatidis, Georgios Paliouras, George A. Vouros, and 
Alexander Artikis. Probabilistic event calculus based on markov logic 
networks. In RuleML America, pages 155-170, 2011. 

[38] Son D. Tran and Larry S. Davis. Event modeling and recognition using 
markov logic networks. In ECCV '08: Proceedings of the 10th European 
Conference on Computer Vision, pages 610-623, 2008. 

[39] D.L. Vail, M.M. Veloso, and J.D. Lafferty. Conditional random fields 
for activity recognition. In Proceedings of the 6th international joint 
conference on Autonomous agents and multiagent systems, page 235. 
ACM, 2007. 

[40] L. G. Valiant. The complexity of enumeration and reliability problems.
SIAM Journal on Computing, 8:410-421, 1979.

[41] J. Wang and P. Domingos. Hybrid markov logic networks. In Proceed- 
ings of the 23rd national conference on Artificial intelligence, volume 2, 
pages 1106-1111, 2008. 

[42] T. Wu, C. Lian, and J.Y. Hsu. Joint recognition of multiple concurrent 
activities using factorial conditional random fields. In Proc. 22nd Conf. 
on Artificial Intelligence (AAAI-2007), 2007. 


